(54) Method of replication-based garbage collection in a multiprocessor system



(57) Improved method of replication-based garbage collection in a
multiprocessing system comprising a plurality of processors, a
memory divided into a current area (from-space) used by the
processors during current program execution and a reserved area
(to-space), and at least one garbage collector for performing, when
necessary, a garbage collection consisting in flipping the roles of
the current area and reserved area after all the live objects stored
in the current area have been copied into the reserved area, and for
reclaiming the current area after the flipping operation. Several
program threads (mutators) are currently running in parallel and the
garbage collector performs the garbage collection in parallel with
the program threads, the flipping operation being performed after
the program threads have been stopped and the garbage collection has
been completed. The method comprises the steps of storing, during
normal program execution, a record in a local buffer allocated to
each program thread each time this program thread updates a memory
location, and adding this local buffer when full to a global list of
buffers using a first wait-free synchronization operation, and,
during garbage collection, removing the local buffers one by one
from the global list of buffers using a second wait-free
synchronization operation, and looping over records in each removed
local buffer and copying the updated memory locations into the
reserved area until the global list is empty.
Description
Technical field 

[0001] The present invention relates generally to a technique for
automatically reclaiming the memory space which is occupied by data
objects referred to as garbage, which the running program will not
access any longer, and relates particularly to a method of
replication-based garbage collection in a multiprocessor
environment.

Background 

[0002] Garbage collection is the automatic reclamation of computer
storage. While in many systems programmers must explicitly reclaim
heap memory at some point in the program, by using a « free » or
« dispose » statement, garbage collected systems free the programmer
from this burden. The garbage collector's function is to find data
objects that are no longer in use and make their space available for
reuse by the running program. An object is considered garbage, and
subject to reclamation, if it is not reachable by the running
program via any path of pointer traversals. Live (potentially
reachable) objects are preserved by the collector, ensuring that the
program can never traverse a « dangling pointer » into a deallocated
object.
[0003] The basic functioning of a garbage collector consists,
abstractly speaking, of two parts:

1. Distinguishing the live objects from the garbage 
in some way, or garbage detection, and 

2. Reclaiming the garbage objects' storage, so that 
the running program can use it. 

[0004] In practice, these two phases may be function- 
ally or temporally interleaved, and the reclamation tech- 
nique is strongly dependent on the garbage detection 
technique. 

[0005] In general, garbage collectors use a « liveness » criterion
that is somewhat more conservative than those used by other systems.
This criterion is defined in terms of a root set and reachability
from these roots. At the point when garbage collection occurs, all
globally visible variables of active procedures are considered live,
and so are the local variables of any active procedures. The root
set therefore consists of the global variables, local variables in
the activation stack, and any registers used by active procedures.
Heap objects directly reachable from any of these variables could be
accessed by the running program, so they must be preserved. In
addition, since the program might traverse pointers from those
objects to reach other objects, any object reachable from a live
object is also live. Thus, the set of live objects is simply the set
of objects on any directed path of pointers from the roots.
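
By way of illustration only, the reachability criterion described above may be sketched as follows. The heap model (a dictionary mapping hypothetical object names to the names they point to) and the function name `live_objects` are assumptions made for this sketch; they are not part of the method described.

```python
def live_objects(roots, heap):
    """Return the set of objects reachable from the roots.

    heap maps each object name to the list of object names it points to
    (a hypothetical model of the pointer graph).
    """
    live = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in live:
            continue  # already visited, e.g. through a cycle
        live.add(obj)
        stack.extend(heap.get(obj, []))
    return live


# "d" points to a live object but is itself unreachable; "e" only points to itself.
heap = {"a": ["b"], "b": ["c"], "c": [], "d": ["a"], "e": ["e"]}
live = live_objects(["a"], heap)
garbage = set(heap) - live
```

In this sketch, an object that merely points to a live object, or that is part of an unreachable cycle, is still garbage, since no path of pointers leads to it from the root set.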
[0006] Any object that is not reachable from the root set is
garbage, i.e. useless, because there is no legal sequence of program
actions that would allow the program to reach that object. Garbage
objects therefore cannot affect the future course of the
computation, and their space may be safely reclaimed.

[0007] Given the basic two-part operation of a garbage collector,
several variations are possible. The first part, that is
distinguishing live objects from garbage, may be done by several
methods. Among them, copying garbage collection does not really
collect garbage. Rather, it moves all of the live objects into one
area of the heap (the space in memory where all objects are held),
whereas the area holding the reclaimed objects can be reused for new
objects.

[0008] A very common kind of copying garbage collection is the
semi-space collector. In this scheme, the space devoted to the heap
is subdivided into two parts, a current area or from-space and a
reserved area or to-space. During normal program execution, only the
from-space is in use. When the running program requests an
allocation that will not fit in the unused area of the from-space,
the program is stopped and the copying garbage collector is called
to reclaim space. The roles of the current area and reserved area
are flipped, that is, all the live data are copied from the
from-space to the to-space.

[0009] Once the copying is completed, the to-space is made the
current area and program execution is resumed. Thus, the roles of
the two spaces are reversed each time the garbage collector is
invoked.
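
As a minimal sketch of the semi-space scheme just described, and not as a definitive implementation, the copying step may be modelled as below. The representation of the heap as a dictionary from addresses to field lists, the `("ptr", address)` encoding of pointers, and the names `collect` and `forwarding` are all assumptions of this sketch.

```python
def collect(roots, from_space):
    """Copy every object reachable from roots into a fresh to-space.

    from_space maps an address to a list of fields; a field is either a
    plain value or a ("ptr", address) tuple (hypothetical encoding).
    Returns (new_roots, to_space).
    """
    to_space = {}
    forwarding = {}  # from-space address -> to-space address
    pending = []     # copied objects whose fields still need scanning

    def copy(addr):
        if addr not in forwarding:
            new_addr = len(to_space)
            forwarding[addr] = new_addr
            to_space[new_addr] = list(from_space[addr])
            pending.append(new_addr)
        return forwarding[addr]

    new_roots = [copy(r) for r in roots]
    while pending:
        fields = to_space[pending.pop()]
        for i, field in enumerate(fields):
            if isinstance(field, tuple) and field[0] == "ptr":
                # Redirect the pointer to the copy, copying it first if needed.
                fields[i] = ("ptr", copy(field[1]))
    return new_roots, to_space


from_space = {100: [1, ("ptr", 200)], 200: [2, ("ptr", 100)], 300: [3]}
new_roots, to_space = collect([100], from_space)
```

After the call, the returned to-space would play the role of the current area; the object at address 300, being unreachable, is simply never copied.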
[0010] The technique of replication-based garbage collection is to
let the collector work in parallel with the program threads or
mutators. In contrast to previous copying garbage collection
algorithms, replication-based garbage collection delays the flip
until the end of the collection cycle. While the mutators keep
running and operate on from-space, the collector replicates the live
objects from the from-space to the to-space. Finally, in the flip
stage, the mutators are stopped and then roots are updated to point
to the replicated objects in the to-space.

[0011] But, while the replication is executed, objects in from-space
keep on changing and this has to be reflected in the to-space
replica. In order to make the replica consistent, the mutators log
all modifications to a mutation log. The collector flips after it
has cleared the mutation log, that is, after it has applied each
update to the replica. In practice, the collector stops the mutator
threads for a short pause during which the collector updates the
mutator roots, and then flips the roles of from-space and to-space.
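
The role of the mutation log may be illustrated by the following single-threaded sketch, in which mutator and collector steps are interleaved by hand. The dictionaries standing for the two areas and the helper names `mutator_write` and `replicate` are assumptions made for illustration only.

```python
from_space = {"x": 1, "y": 2}
to_space = {}
mutation_log = []


def mutator_write(location, value):
    """Mutator side: update from-space and log the modified location."""
    from_space[location] = value
    mutation_log.append(location)


def replicate(location):
    """Collector side: copy the current contents of a location to the replica."""
    to_space[location] = from_space[location]


# The collector replicates while the mutator keeps writing.
replicate("x")
mutator_write("x", 10)
replicate("y")
mutator_write("y", 20)
stale = dict(to_space)  # replica as it stands before the log is cleared

# Before flipping, the collector clears the log by re-applying each update.
while mutation_log:
    replicate(mutation_log.pop())

# Flip: the replica becomes the current area.
from_space, to_space = to_space, {}
```

Without the replay of the log, the replica in this sketch would keep the outdated values held in `stale`.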

[0012] However, the above replication-based garbage collection is
not suitable for a modern multiprocessor system wherein it is not
guaranteed that the operations executed by one processor always
appear in the same order in the view of another processor. Thus, it
is possible that the collector will see the update of a location
only after it reads the record of that update in the mutation log.
From the collector standpoint, this means that it might copy the
contents of the location before the new value actually appears in
its view. As a consequence, the new replica in to-space will contain
an outdated value of the location which, furthermore, will never be
updated.

Summary of the Invention 

[0013] Accordingly, the main object of the invention is to provide a
new method of replication-based garbage collection which can be run
in a multiprocessor system without the risk that the contents of
memory locations are replicated from the current area to the
reserved area while their updates have not been taken into
consideration.

[0014] Therefore, the invention relates to an improved method of
replication-based garbage collection in a multiprocessing system
comprising a plurality of processors, a memory divided into a
current area (from-space) used by the processors during current
program execution and a reserved area (to-space), and at least one
garbage collector for performing, when necessary, a garbage
collection consisting in flipping the roles of the current area and
reserved area after all the live objects stored in the current area
have been copied into the reserved area, and for reclaiming the
current area after the flipping operation. Several program threads
(mutators) are currently running in parallel and the garbage
collector performs the garbage collection in parallel with the
program threads, the flipping operation being performed after the
program threads have been stopped and the garbage collection has
been completed. The method of replication-based garbage collection
comprises the steps of storing, during normal program execution, a
record in a local buffer allocated to each program thread each time
this program thread updates a memory location, and adding this local
buffer when full to a global list of buffers using a first
synchronization operation, and, during garbage collection, removing
the local buffers one by one from the global list of buffers using a
second synchronization operation, and looping over records in each
removed local buffer and copying the updated memory locations into
the reserved area until the global list is empty.

Brief description of the drawings 

[0015] The objects, characteristics and advantages of the invention
will become clear from the following description given in reference
to the accompanying drawings wherein:

Fig. 1 represents a schematic block diagram of the
local buffers associated with each processor in a
multiprocessor system and the global list of the
local buffers according to the method of the invention.



Fig. 2 is a flow chart representing the steps of 
updating memory locations and storing records in a 
local buffer by a mutator. 

Fig. 3 is a flow chart representing the different steps
under the control of the collector during a collection
cycle according to the method of the invention.

Fig. 4 is a flow chart representing the different steps
performed by the collector for looping over records
in a local buffer during a collection cycle according
to the method of the invention.

Detailed description of the Invention 


[0016] Referring to Figure 1, the principle of the invention is to
associate a local buffer 10, 12 or 14 respectively with each one of
the program threads 16, 18 or 20 running in parallel. This local
buffer is used by the program thread to store all its mutation
records rather than storing them directly into the mutation log, as
in the previous methods for replication-based garbage collection.
Once a local buffer 10, 12 or 14 is filled with records, the mutator
adds a pointer to it to a global list 22, which is an array of
pointers. Adding a pointer to global list 22 is done using a
synchronization operation as explained below. When the garbage
collection is performed, the collector removes the pointers from the
global list one by one using the same synchronization mechanism, and
performs the needed updates on the replica as dictated by the buffer
records.

[0017] Note that processes could be used instead of program threads
to implement the invention. Such a process contains an address space
and several threads, one of them being the main thread. Each thread
(sometimes called thread of control) has its own stack, registers
and program counter. All threads share the memory space of the
process. When a process with no threads is run, all the properties
of a thread become the properties of the process.

[0018] The different steps of the method according to the invention
are detailed in Figures 2, 3 and 4. As illustrated in Figure 2,
after starting (30) the collection cycle, the program thread, also
called mutator, updates a memory location (changes its contents)
(32). The record of this update is stored in the associated local
buffer (34). It is then determined whether the local buffer is full
(36). If not, the updating operation is ended (38). If so, a memory
coherence synchronization is first performed (39). Then, the local
buffer is added to the global list of buffers (40) using a
synchronization such as a wait-free synchronization, or any other
appropriate synchronization well known to those skilled in the art,
since the global list is shared by all mutators and the collector.
After the synchronization, a new local buffer is allocated to the
mutator (41).
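
The mutator-side steps of Figure 2 may be sketched as follows, under stated assumptions. The class name `Mutator`, the constant `BUFFER_CAPACITY` and the use of a plain Python list for the global list are hypothetical; Python offers no memory coherence instruction, so step 39 appears only as a comment, and `list.append` merely stands in for the synchronized insertion of step 40.

```python
BUFFER_CAPACITY = 4  # hypothetical buffer size, chosen for the example


class Mutator:
    """Sketch of a program thread recording its updates in a local buffer."""

    def __init__(self, memory, global_list):
        self.memory = memory            # stands for from-space
        self.global_list = global_list  # shared list of full buffers
        self.local_buffer = []

    def update(self, address, value):
        self.memory[address] = value                  # step 32: update the location
        self.local_buffer.append(address)             # step 34: store a record
        if len(self.local_buffer) < BUFFER_CAPACITY:  # step 36: buffer full?
            return                                    # step 38: not full, done
        # Step 39: a memory coherence synchronization would be issued here
        # on a real platform; there is no equivalent in this sketch.
        self.global_list.append(self.local_buffer)    # step 40 (stand-in)
        self.local_buffer = []                        # step 41: new local buffer


memory, global_list = {}, []
mutator = Mutator(memory, global_list)
for i in range(5):
    mutator.update(i, i * i)
```

After five updates with a capacity of four records, one full buffer has been handed to the global list and the fifth record sits in a fresh local buffer.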

[0019] Note that the memory coherence synchronization is required in
view of the « partial memory coherence », meaning that when a
program thread on one processor performs several write operations,
the order of the updates may be different for a second program
thread running on a different processor. When this incoherence
endangers the correctness of a concurrent program, the programmer
must make sure that such an inconsistency does not occur at the
sensitive spots in the program. Each platform offers its own special
instruction to enforce memory coherence. These instructions mean
that any update operation that is performed before the memory
coherence synchronization instruction is perceived by all threads as
occurring before any update performed after the memory coherence
synchronization.

[0020] Note also that, although any appropriate synchronization
could be used for adding a local buffer to the global list, a
wait-free synchronization is preferable. Indeed, a wait-free
synchronization operation is performed by a synchronization
mechanism that works in a « wait-free » manner, that is, without
blocking the computer that uses the instruction. Such an operation
can be a compare and swap instruction taking three parameters:
address, compared-value and new-value. If the memory value at the
given address matches the given compared-value, then the new-value
is put into the location. The instruction returns a code indicating
whether the comparison and setting were successful. The main feature
of this instruction is that it is done atomically. Namely, no
parallel process can change the value at the same time that the
compare and swap instruction is executed. After the failure of such
an instruction, the process may decide whether to try again or to
execute other code. In contrast to a wait-free synchronization, a
blocking synchronization is a synchronization which keeps the
processor blocked until a certain event happens. Thus, with a
blocking synchronization, a processor performing work fed by another
processor may decide to wait, when the shared list of records is
empty, until a record is written into this list. In the invention,
the wait-free synchronization guarantees that, if more than one
mutator is modifying the global list by adding a local buffer, then
the global list will not be corrupted and the changes will be
reflected properly in the list. It must be noted that, on some
platforms, a wait-free synchronization causes an implicit memory
coherence synchronization. In such a case, only the wait-free
synchronization of step 40 in Figure 2 is needed and step 39 does
not exist.
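
The use of a compare and swap instruction to add and remove buffers may be sketched as follows. Python has no hardware compare and swap, so the class `AtomicCell` simulates one with a lock whose only purpose is to model the atomicity of the instruction; `AtomicCell`, `Node`, `add_buffer` and `remove_buffer` are hypothetical names, and the linked list used here is an assumption of the sketch (the description above speaks of an array of pointers). Retrying after a failed instruction is one of the two options mentioned above.

```python
import threading


class AtomicCell:
    """Simulated compare and swap cell (the lock only models atomicity)."""

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, compared_value, new_value):
        """Store new_value if the cell still holds compared_value; report success."""
        with self._lock:
            if self._value is compared_value:
                self._value = new_value
                return True
            return False


class Node:
    def __init__(self, buffer, next_node):
        self.buffer = buffer
        self.next = next_node


head = AtomicCell(None)  # head of the global list of buffers


def add_buffer(buffer):
    """Mutator side: link a full local buffer into the global list."""
    while True:
        old = head.load()
        if head.compare_and_swap(old, Node(buffer, old)):
            return  # success; on failure another thread won, so try again


def remove_buffer():
    """Collector side: unlink one buffer, or return None if the list is empty."""
    while True:
        old = head.load()
        if old is None:
            return None
        if head.compare_and_swap(old, old.next):
            return old.buffer
```

In this sketch a thread whose compare and swap fails is never suspended; it simply observes the new head and retries, so concurrent additions cannot corrupt the list.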
[0021] In parallel with the recording of location updates by the
mutators, the collector starts (42) a buffer reading cycle as
illustrated in Figure 3. It is first determined whether the global
list is empty (44). If not, a buffer is removed from the global list
(46) using a wait-free synchronization operation identical to the
synchronization operation used for adding the buffers to the global
list. Then, the collector goes over all records in the buffer and
copies the changed values into the to-space of the memory (48).



[0022] In case the global list of local buffers is empty, the
collector stops all mutators (50) in order to finish the collection.
It verifies again whether the global list is empty (52), since one
of the mutators may have added a buffer before stopping. If not, the
collector performs again the operation of removing the buffers from
the global list (54) and the operation of looping over records in
the buffers to apply updates (56). When the global list has been
emptied while the mutators have been stopped, the collector loops
over all local buffers that have not yet been added to the global
list (58). Finally, the collector completes the collection cycle by
performing the flip between the from-space and the to-space (60),
activates the mutators (62) and ends the collection cycle (64).
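
The collector-side steps of Figure 3 may be sketched as follows, as a simplified model rather than a definitive implementation. The function name `collection_cycle`, the callbacks `stop_mutators` and `resume_mutators`, the use of dictionaries for the two areas and of a plain list for the global list are assumptions of this sketch; `list.pop` stands in for the synchronized removal, and pointer handling is left out.

```python
def collection_cycle(from_space, to_space, global_list, local_buffers,
                     stop_mutators, resume_mutators):
    """Drain the buffers, stop the mutators, drain again, then flip.

    Returns (current_area, reclaimed_area) after the flip.
    """

    def apply(buffer):
        # Copy the current contents of each recorded location to the replica.
        for address in buffer:
            to_space[address] = from_space[address]

    while global_list:            # step 44: global list empty?
        apply(global_list.pop())  # steps 46 and 48
    stop_mutators()               # step 50
    while global_list:            # step 52: a buffer may have been added late
        apply(global_list.pop())  # steps 54 and 56
    for buffer in local_buffers:  # step 58: buffers not yet in the global list
        apply(buffer)
        buffer.clear()
    current, reclaimed = to_space, from_space  # step 60: flip
    reclaimed.clear()             # the former current area is reclaimed
    resume_mutators()             # step 62
    return current, reclaimed     # step 64
```

The second drain after `stop_mutators()` matters in this sketch: a mutator that hands over a buffer just before being stopped would otherwise leave its updates out of the replica.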

[0023] The operation of looping over the records in a local buffer
(steps 48 and 56 of Figure 3) is illustrated in Figure 4. After
starting (70), the first address in the buffer is scanned (72). A
location of the replica is determined in the to-space and the
contents of the updated location are copied from from-space into
to-space (74). At this point, the collector determines whether the
value copied in to-space is a pointer (76). If so, the referred-to
objects are scanned (78) and the pointer to the object is updated to
refer to the new copy (80). If not, it is determined whether the
scanned record is the last one (82), in which case the process is
ended (84). Otherwise, the collector scans the next record address
in the local buffer (86) and performs again the same process for
this record. Note that the scanning operation means updating the
references to from-space into references to to-space and copying the
referenced objects from from-space to to-space if not yet copied, as
is well known to those skilled in the art.
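
The loop of Figure 4 may be sketched as follows under simplifying assumptions: every object occupies a single location, a pointer is encoded as a `("ptr", address)` tuple, and a `forwarding` dictionary maps from-space addresses to replica addresses. These conventions and the names `replicate` and `loop_over_records` are hypothetical and serve only to illustrate the steps.

```python
def is_pointer(value):
    return isinstance(value, tuple) and len(value) == 2 and value[0] == "ptr"


def replicate(addr, from_space, to_space, forwarding):
    """Copy from-space location addr into its replica; return the replica address."""
    if addr not in forwarding:
        # No replica yet: reserve the next free to-space cell
        # (assumes to-space addresses are 0, 1, 2, ...).
        forwarding[addr] = len(to_space)
        to_space[forwarding[addr]] = None
    target = forwarding[addr]               # step 74: locate the replica
    value = from_space[addr]
    if is_pointer(value):                   # step 76: is the value a pointer?
        referent = value[1]
        if referent not in forwarding:      # step 78: scan the referred-to object
            replicate(referent, from_space, to_space, forwarding)
        value = ("ptr", forwarding[referent])  # step 80: refer to the new copy
    to_space[target] = value
    return target


def loop_over_records(buffer, from_space, to_space, forwarding):
    for addr in buffer:                     # steps 72, 82 and 86
        replicate(addr, from_space, to_space, forwarding)


# Location 10 was replicated with a stale value; 20 now points to 30.
from_space = {10: 5, 20: ("ptr", 30), 30: 7}
to_space = {0: 1}
forwarding = {10: 0}
loop_over_records([10, 20], from_space, to_space, forwarding)
```

In this example the stale replica of location 10 is refreshed, and the pointer stored in location 20 is rewritten to designate the to-space copy of location 30, which is created on demand.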

[0024] It must be noted that the synchronization mechanism which
handles the access to the global list of local buffers is an
essential feature of the invention, useful in two ways. On the one
hand, it manages the queue of buffers, which must handle parallel
updates. On the other hand, this synchronization makes sure that,
when the collector gets the buffer to work on, its view is updated
to contain all the memory modifications reported by the records in
this buffer. This is true since, when the mutator synchronizes to
insert the buffer into the queue, its view already reflects all
these modifications. When the collector later synchronizes to get
this buffer, it gets updated with these modifications as required.
[0025] Although the invention has been described in reference to a
preferred embodiment, it is understood that numerous changes may be
resorted to by those skilled in the art without departing from the
scope of the invention. Thus, it would be possible to use several
collectors running in parallel rather than a single collector. In
such a case the synchronization problems would depend very much on
the specific way chosen to implement the parallel collection.
Claims

1. In a multiprocessing system comprising a plurality
of processors, a memory divided into a current area
(from-space) used by said processors during current
program execution and a reserved area (to-space),
and at least one garbage collector for performing
when necessary a garbage collection consisting in
flipping the roles of said current area and
reserved area after all the live objects stored in said
current area have been copied into said reserved
area and for reclaiming said current area after the
flipping operation, and wherein several program
threads (mutators) or the like are currently running
in parallel and said garbage collector performs said
garbage collection in parallel with said program
threads, the flipping operation being performed
after said program threads have been stopped and
said garbage collection has been completed;

an improved method of replication-based garbage
collection comprising the following steps:

• during normal program execution, each
program thread stores a record in a local
buffer allocated thereto each time said
program thread updates a memory location,
and adds said local buffer when full to a
global list of buffers using a first
synchronization operation, and

• during garbage collection, said collector
removes the local buffers one by one from
said global list of buffers using a second
synchronization operation, and loops over
records in each removed local buffer and
copies the updated memory locations into
said reserved area until said global list is
empty.

2. The method according to claim 1, wherein said
synchronization operation is an instruction of wait-free
synchronization performed without blocking said
program thread or said collector which initializes
such an instruction whatever the result of said
instruction.

3. The method according to claim 2, wherein said
wait-free synchronization instruction is of the type
« compare and swap » instruction.

4. The method according to claim 1, 2 or 3, further
comprising the following steps after said program
threads have been stopped:

determining whether said global list contains
other buffers which have been added to the
global list during the removing of buffers by
said collector, and

if said global list is not empty, removing by said
collector the newly added buffers one by one
from said global list, and looping over the
records in each removed buffer and copying
the updated memory locations into said
reserved area until said global list is empty.

5. The method according to claim 4, wherein said step
of looping over records and copying the updated
memory in said reserved area is also performed
with all local buffers allocated to said program
threads after said global list has been emptied.

6. The method according to any one of the preceding
claims, wherein said step of looping over records in
a local buffer consists in copying the contents of
locations which have been updated from said
current area to said reserved area.

7. The method according to claim 6, further comprising
the step of determining whether the value copied
in said reserved area is a pointer and, if so,
scanning the referred-to objects and updating said
pointer.